Language Extensions and Compilation Techniques for Data Intensive Computations
Authors
Abstract
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. Typical examples of very large scientific datasets include long-running simulations of time-dependent phenomena that periodically generate snapshots of their state, archives of raw and processed remote sensing data, and archives of medical images. High-level language and compiler support for developing applications that analyze and process such datasets has, however, been lacking so far. We are developing language extensions and a compilation framework for expressing applications that process large multidimensional datasets in a high-level data-parallel fashion. We have chosen a dialect of Java for expressing these applications. Our dialect of Java includes data-parallel extensions for specifying collections of objects, a parallel for loop, and reduction variables. Our compiler will analyze parallel loops and optimize the processing of datasets through the use of an existing runtime system, called Active Data Repository (ADR), developed at the University of Maryland. We present the design of a compiler/runtime interface that allows the compiler to effectively utilize the existing runtime system. We show how interprocedural static program slicing can be used by the compiler to extract relevant information for the runtime system. Implementation of these compiler techniques is currently underway using the Titanium infrastructure.
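As a rough illustration of the three extensions named in the abstract (a collection of objects, a parallel loop, and a reduction variable), the sketch below uses standard Java parallel streams. It does not use the authors' dialect syntax, which the abstract does not show, and it does not involve ADR. The names `ReductionSketch`, `Sample`, `Region`, and `sumInRegion` are hypothetical and were chosen only for this example.

```java
import java.util.Arrays;
import java.util.List;

public class ReductionSketch {
    /** Hypothetical dataset element: a 2-D coordinate with an integer value. */
    static final class Sample {
        final int x, y;
        final long value;
        Sample(int x, int y, long value) { this.x = x; this.y = y; this.value = value; }
    }

    /** Hypothetical rectangular query region, inclusive on all four sides. */
    static final class Region {
        final int xLo, yLo, xHi, yHi;
        Region(int xLo, int yLo, int xHi, int yHi) {
            this.xLo = xLo; this.yLo = yLo; this.xHi = xHi; this.yHi = yHi;
        }
        boolean contains(Sample s) {
            return s.x >= xLo && s.x <= xHi && s.y >= yLo && s.y <= yHi;
        }
    }

    /**
     * Sums the values of all samples that fall inside the region.
     * The parallel stream stands in for a parallel for loop over a collection,
     * and the sum stands in for a reduction variable: addition is associative
     * and commutative, so elements may be processed in any order.
     */
    static long sumInRegion(List<Sample> data, Region region) {
        return data.parallelStream()
                   .filter(region::contains)
                   .mapToLong(s -> s.value)
                   .sum();
    }

    public static void main(String[] args) {
        List<Sample> data = Arrays.asList(
            new Sample(0, 0, 5),
            new Sample(1, 1, 7),
            new Sample(2, 2, 11),
            new Sample(5, 5, 100));
        // Only the first three samples lie inside the region [0,2] x [0,2].
        System.out.println(sumInRegion(data, new Region(0, 0, 2, 2))); // prints 23
    }
}
```

Because the only cross-iteration state is the order-independent sum, a compiler or runtime system is free to partition the collection and process the pieces in whatever order suits the data layout, which is the property that reduction variables are meant to expose.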
Similar resources
High Level Programming Methodologies for Data Intensive Computations
Solving problems that have large computational and storage requirements is becoming increasingly critical for advances in many domains of science and engineering. By allowing algorithms for such problems to be programmed in widely used or rapidly emerging high-level paradigms, like object-oriented and declarative programming models, rapid prototyping and easy development of computational techni...
Optimization of query evaluation for multidimensional raster databases
Many interpreted languages suffer from higher processing times, mostly due to the overhead introduced by the "virtual machine" abstraction layer. A typical situation where interpreters are much slower than compiled programs is when complex computations are needed. JIT (just-in-time) compilation techniques have proved very successful in solving this problem; however, not many languages implement ...
HPF-2 Support for Dynamic Sparse Computations
There is a class of sparse matrix computations, such as direct solvers of systems of linear equations, that change the fill-in (nonzero entries) of the coefficient matrix, and involve row and column operations (pivoting). This paper addresses the problem of the parallelization of these sparse computations from the point of view of the parallel language and the compiler. Dynamic data structures ...
Vienna-Fortran/HPF Extensions for Sparse and Irregular Problems and Their Compilation
Vienna Fortran, High Performance Fortran (HPF), and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMP) at a relatively high level of abstraction, based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the processors of a machine. In this paper...
A Backend Extension Mechanism for PQL/Java with Free Run-Time Optimisation
In many data processing tasks, declarative query programming offers substantial benefit over manual data analysis: the query processors found in declarative systems can use powerful algorithms such as query planning to choose high-level execution strategies during compilation. However, the principal downside of such languages is that their primitives must be carefully curated, to allow the quer...